Skip to content

Store dist manifest in JSON to improve load performance #4344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented May 22, 2025

Currently, the DIST_MANIFEST (multirust-channel-manifest) is being stored in TOML, which is not great for performance. The file has ~1 MiB on disk, and loading it takes ~24ms on my notebook. That's more than half the runtime of invoking rustc --version through rustup.

Since the manifest is generated and read only by rustup (AFAIK), I think that we could use a faster format to reduce this bottleneck. In this PR I chose to use JSON, for several reasons:

  • It is a very stable format that is unlikely to introduce breaking changes in the near future.
  • serde_json is well maintained and unlikely to be abandoned, and there are many other JSON (de)serialization alternatives.
  • JSON is more or less human readable if someone needed to inspect the file manually.
  • The JSON manifest is much faster to parse than TOML, ~1.7ms vs ~24ms on my machine.
  • serde_json can out of the box deal with the current Manifest format. Since it uses #[serde(from...]) and skip_serializing, this breaks other formats, such as bincode, rmp_serde (MsgPack) and Postcard. These would produce smaller files and even higher performance than JSON, but they would also require some changes to the Manifest structure.

The manifest is parsed on more places (e.g. make_component_unavailable), but I only modified the DIST_CHANNEL file, since that was the bottleneck that I saw.

Context: #2626

@djc
Copy link
Contributor

djc commented May 22, 2025

Thanks for working on this!

Not sure whether this is intended as just sharing your experiment or whether you actually want to land this.

I feel like we should explore #2626 (comment) a bit before we invest time in this.

@adamreichold
Copy link

I feel like we should explore #2626 (comment) a bit before we invest time in this.

Especially since that reduced list of installed components might be more amenable to a smaller binary format like Postcard compared to the full manifest.

@Kobzol
Copy link
Contributor Author

Kobzol commented May 22, 2025

I mostly wanted to see how much code would have to be changed to use the JSON format, seems like not that much. So I don't think it would be so bad to land as-is.

Doing the other approach would likely require larger code changes, but it might be a better solution overall. I would have to first understand how do the manifests work to implement it though.

@Kobzol
Copy link
Contributor Author

Kobzol commented May 27, 2025

The structure of the code makes it a bit challenging to change this. Essentially, the code first loads the full manifest, then it goes through all its rust components for a given target, and for each of them it checks that it is installed. We would need to store some smaller submanifests split by target (one file per target), which would only contain the rust package to make rustup load less data.

However, while investigating this, I also noticed something interesting. In the "common case", where you just execute rustc without any rust-toolchain.toml override, the list of components and targets to be checked is actually empty. So in that case we load the manifest for no reason, we parse it, go through all the components, and then we do .all() on an empty iterator.

#4350 should fix this.

@Kobzol
Copy link
Contributor Author

Kobzol commented May 27, 2025

I think that this change is no longer worth it now that #4350 has been merged.

@Kobzol Kobzol closed this May 27, 2025
@bjorn3
Copy link
Member

bjorn3 commented May 27, 2025

Wouldn't this still be worth it for the people who do specify components or targets in rust-toolchain.toml? Those are pretty common in my experience.

@Kobzol
Copy link
Contributor Author

Kobzol commented May 27, 2025

It would definitely make the parsing faster for their use-cases. Not sure whether it is worth it to solve that with this specific approach, it seemed like it wasn't very popular with rustup maintainers. I'm of course willing to reopen and finish this if there's interset :)

@djc
Copy link
Contributor

djc commented May 27, 2025

The structure of the code makes it a bit challenging to change this. Essentially, the code first loads the full manifest, then it goes through all its rust components for a given target, and for each of them it checks that it is installed. We would need to store some smaller submanifests split by target (one file per target), which would only contain the rust package to make rustup load less data.

Maybe it would make sense to store a "mini manifest" only for targets that are installed?

@Kobzol
Copy link
Contributor Author

Kobzol commented May 27, 2025

I guess it would be possible, yeah. And if the target file is not on disk, it would be considered to be missing. Note that I'm not sure how "mini" would the manifests be, it looked like the rust component, cat multirust-channel-manifest.toml | grep "pkg.rust\." | wc -l has 5757 entries for the stable toolchain (although many of these are duplicated for multiple targets).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants